Graphics Fundamentals

Raphael Aggio

2016

Today’s content

  • Graphical excellence
  • Grammar of graphics
  • What’s coming up in the hands-on sessions
  • Further reading

What are graphics for?

Comprehend this:

# Anscombe's quartet ships with base R
data(anscombe)
# Reorder the columns so each x sits next to its matching y
anscombe[ , c(1,5,2,6,3,7,4,8)]
##    x1    y1 x2   y2 x3    y3 x4    y4
## 1  10  8.04 10 9.14 10  7.46  8  6.58
## 2   8  6.95  8 8.14  8  6.77  8  5.76
## 3  13  7.58 13 8.74 13 12.74  8  7.71
## 4   9  8.81  9 8.77  9  7.11  8  8.84
## 5  11  8.33 11 9.26 11  7.81  8  8.47
## 6  14  9.96 14 8.10 14  8.84  8  7.04
## 7   6  7.24  6 6.13  6  6.08  8  5.25
## 8   4  4.26  4 3.10  4  5.39 19 12.50
## 9  12 10.84 12 9.13 12  8.15  8  5.56
## 10  7  4.82  7 7.26  7  6.42  8  7.91
## 11  5  5.68  5 4.74  5  5.73  8  6.89

compared to:
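A minimal sketch of that comparison, using only base R. The summary statistics of the four x/y pairs are almost identical, yet the plots look nothing alike. The axis limits, panel titles and the fitted line are illustrative choices, not taken from the original slide.

```r
data(anscombe)

# Means and correlation for each of the four x/y pairs
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 2))

# The same four data sets drawn as a 2 x 2 grid of scatterplots
op <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, xlim = c(3, 20), ylim = c(2, 14), pch = 19,
       main = paste("Set", i))
  abline(lm(y ~ x), col = "grey50")  # least-squares line for this set
}
par(op)
```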

Put the data in its place

Use during analysis

Present results

Where do graphics fit in?

Different purposes

…exploratory…

…analysis and diagnosis…

…presentation…

diagram

From http://r4ds.had.co.nz/index.html

Key points in common:

  • Comparative
  • Multivariate
  • High data density
  • Reveal interactions and comparisons
  • Nearly all the ink is data ink

One continuous dimension

One dimension

‘Compared to what?’

Or

Or

Not just:

Show the data if you can
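One possible way to show the data rather than only a summary, sketched in base R. The built-in iris data set and the styling are assumptions for illustration; the idea is to overlay every observation on the boxplot.

```r
data(iris)

# Summary only: a boxplot of one continuous variable by group
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              outline = FALSE, border = "grey50", col = NA,
              ylab = "Sepal length (cm)")

# Show the data: add every observation, jittered sideways to reduce overlap
stripchart(Sepal.Length ~ Species, data = iris,
           vertical = TRUE, method = "jitter", jitter = 0.15,
           pch = 19, col = rgb(0, 0, 0, 0.4), add = TRUE)
```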

Two continuous dimensions

“…the relational graphic - in its barest form, the scatterplot and its variants - is the greatest of all graphical designs. It links at least two variables, encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables.”

Edward Tufte

The humble scatterplot

Add a third dimension

And more
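A hedged sketch of adding dimensions to a scatterplot, assuming the ggplot2 package is installed. The built-in mtcars data set and the choice of colour and size as the extra aesthetics are illustrative only.

```r
library(ggplot2)

# x and y carry two continuous variables;
# colour adds a third variable and point size a fourth
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl), size = hp), alpha = 0.7) +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon",
       colour = "Cylinders", size = "Horsepower")
print(p)
```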

Time element…

… for general users

Graphical Excellence

  • well-designed presentation of interesting data - substance, statistics, and design
  • complex ideas communicated with clarity, precision, and efficiency
  • greatest number of ideas in the shortest time with the least ink in the smallest space
  • nearly always multivariate
  • telling the truth about the data

Adapted from Tufte

Improvement example 1

Cluttered

Minimal axis guides

Fade axis title

Remove borders

Remove boxes

Guidelines to back

Background to back

Consistent document theme

Consistent font

Corporate colours

Direct labels

Much better than:

Improvement example 2

Original

User-friendly labels

Horizontal text

Meaningful ordering

Better shape and geom

Labels on points

Title and annotation

Another dimension

Better than:

Improvement example 3

Difficult

Use cartesian coordinates

Use height

Flip for readability

Sequence

Maximise focus on data

Labels near the data

Use like a table

Better than:

Statistical transformations

Not just this

But this

Or this
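As one illustration of a statistical transformation, the sketch below draws raw points, then a lowess smooth of the same points, then a kernel density estimate of a single variable. It uses base R and the built-in faithful data set; both are assumptions made for the example.

```r
data(faithful)

# Raw data: one point per eruption
plot(faithful$eruptions, faithful$waiting, pch = 19, col = "grey60",
     xlab = "Eruption length (min)", ylab = "Waiting time (min)")

# Transformation 1: a lowess smooth summarising the trend
smooth <- lowess(faithful$eruptions, faithful$waiting)
lines(smooth, col = "red", lwd = 2)

# Transformation 2: a kernel density estimate of one variable,
# with a rug showing the individual observations
dens <- density(faithful$waiting)
plot(dens, main = "Waiting time between eruptions")
rug(faithful$waiting)
```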

Formats

Raster

  • pixel information as used by cameras
  • .png (preferred) or .jpg
  • use for inserting into Microsoft Office applications
  • width and height measured in pixels
  • resolution defaults to 72 dots per inch (dpi)
  • print quality is around 600 dpi; 720 preferred
  • loses resolution when zooming in
  • relatively efficient for very complex images (e.g. many thousands of dots and lines)

Vector

  • drawing instructions as used by design software
  • .svg (works in modern web browsers) or .pdf (doesn’t work in-line in a web page)
  • use when working with designers or for modern web pages
  • width and height measured in inches (or centimetres)
  • resolution not relevant
  • high quality; no loss when zooming in
  • relatively efficient for simple images
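A minimal sketch of saving the same plot in both kinds of format with base R graphics devices. The file names, sizes and the helper function draw() are illustrative assumptions. The raster file is only written when cairo support is available in the R build.

```r
out_dir <- tempdir()

# Hypothetical helper so the same plot can be sent to several devices
draw <- function() {
  plot(cars$speed, cars$dist, pch = 19,
       xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
}

# Raster: width and height in pixels, resolution in dots per inch.
# 4800 x 3600 pixels at 600 dpi prints at 8 x 6 inches.
if (capabilities("cairo")) {
  png(file.path(out_dir, "cars.png"),
      width = 4800, height = 3600, res = 600, type = "cairo")
  draw()
  invisible(dev.off())
}

# Vector: width and height in inches; resolution is not relevant
pdf_file <- file.path(out_dir, "cars.pdf")
pdf(pdf_file, width = 8, height = 6)
draw()
invisible(dev.off())
```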

Grammar of graphics

A plot is

  • a default dataset
  • mappings from variables to aesthetics
  • one or more layers
  • one scale for each aesthetic mapping
  • a coordinate system
  • a faceting specification
  • thematic look and feel

Each layer is

  • (maybe) its own dataset
  • (maybe) its own mapping from variables to aesthetics
  • a statistical transformation (maybe ‘identity’)
  • a geometric object (line, point, text, etc)

Plots include

  • density plots
  • boxplots
  • barplots
  • dotplots
  • line charts
  • scatter plots
  • bubble graphs
  • maps
  • or something more complicated

But everything can be seen as a combination of mappings, layers, stats, geoms, scales, coordinates, and facets.
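The components listed above can be sketched in a few lines, assuming ggplot2 is installed. The built-in mtcars data set and every styling choice here are illustrative; each line is commented with the part of the grammar it supplies.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +   # default dataset and mappings
  # Layer 1: default data, one extra mapping, identity stat, point geom
  geom_point(aes(colour = factor(cyl))) +
  # Layer 2: default data, smoothing stat (linear model), line geom
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey40") +
  # Scale for the colour aesthetic
  scale_colour_brewer(name = "Cylinders", palette = "Dark2") +
  # Coordinate system
  coord_cartesian() +
  # Faceting specification: one panel per transmission type
  facet_wrap(~ am, labeller = label_both) +
  # Thematic look and feel
  theme_minimal()
print(p)
```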

Improvement example 1

# Packages used below; mbie::mbie.cols() comes from the mbie package
library(dplyr)     # %>%, ungroup(), select(), mutate()
library(tidyr)     # gather()
library(ggplot2)
library(scales)    # percent
library(ggthemes)  # theme_solarized()

# toPlotData and stripcolour are not defined in this snippet;
# they are assumed to have been created earlier in the session.

# Prepare the input data: reshape to long format and give each series a readable name
graphData <- toPlotData %>%
    ungroup() %>%
    select(Data1, Data2, Area, Year) %>%
    gather(variable, Value, -Year, -Area) %>%
    mutate(variable = ifelse(variable == "Data1", "Annual GDP growth rates", "Annual percentage changes of new dwelling approvals"))

# Prepare the direct labels (named lineLabels so the data frame does not mask ggplot2::labs())
lineLabels <- data.frame(
    Year = c(2009, 2006),
    Area = c("West Coast", "West Coast"),
    variable = c("Annual percentage changes of new dwelling approvals", "Annual GDP growth rates"),
    Value = c(-34, 24)
)

# Specify default data and mappings to aesthetics:
ggplot(data = graphData, aes(x = Year, y = Value / 100, colour = variable)) +
    # Layer 1 uses default data and mappings, and line geom:
    geom_line() +
    # Layer 2 uses its own data, adds an aesthetic, and text geom:
    geom_text(data = lineLabels, aes(label = variable), family = "Calibri",
              hjust = 0.5, size = 3) +
    # Facets for overall plot
    facet_wrap(~Area) +
    # Scales for the y and colour aesthetics
    scale_y_continuous("Growth per year", labels = percent) +
    scale_colour_manual(values = mbie::mbie.cols(c(1, 3))) +
    # Titles and guides
    labs(x = "") +
    ggtitle("Economic growth and change of new dwelling approvals, selected NZ regions\n") +
    # Theme for look and feel
    theme_solarized(base_family = "Calibri") +
    theme(panel.background = element_blank()) +
    theme(legend.position = "none",
          strip.background = element_rect(colour = NA, fill = stripcolour))

What next (subject to change…)

Session 2

  • Layered grammar of graphics in action
  • Dodged, stacked, and filled bar charts
  • Controlling scales and labels
  • Setting colours
  • Setting a theme
  • Fonts
  • Scatter plots and labelling points
  • Controlling the order of levels in a scale

Session 3

  • Univariate plots - points, rugs, density, boxplots
  • Direct labelling
  • Transparency and fill
  • Scale transformations
  • More about choosing colours
  • Dot plots
  • Turning a scatter plot into a bubble plot
  • Marginal distributions added to scatter plots
  • Using facets to make “small multiples”

Session 4

  • Some more specialised geoms - segments, rect, contour, tile
  • More on annotation
  • Using SVG with a designer
  • Combining multiple plots
  • More on polishing in general

Session 5 - maps

  • Maps as a variety of scatterplot
  • Importing map backgrounds (satellite, terrain, road maps) from the internet
  • Rendering geospatial areas as polygons
  • Accessing maps of New Zealand via the mbiemaps package
  • Merging statistical data with map data
  • Choropleth maps
  • Additional layers on maps
  • Facets and maps
  • Issues specific to polishing maps: colour, themes, backgrounds, coordinates

Read more

Remember

  • Comparative
  • Multivariate
  • High data density
  • Reveal interactions and comparisons
  • Nearly all the ink is data ink